Talk by Michael Frank of Magicore Systems. Given to the Redwood Center for Theoretical Neuroscience at UC Berkeley.
Abstract: Technology scaling carried computer science through the second half of the 20th century until single-CPU performance started leveling off, after which multi- and many-core processors, including GPUs, emerged as the substrate for high-performance computing. Mobile implementations followed this trend, and today you might be carrying a phone with more than 16 different processors. For power-efficiency reasons, many of the cores are specialized to perform limited functions (such as modem or connectivity control, graphics rendering, or future neural-network acceleration), with most mainstream phones containing four or more general-purpose processors. As Steve Jobs insightfully commented almost a decade ago, “The way the processor industry is going is to add more and more cores, but nobody knows how to program those things.” Jobs was correct: programming these multiprocessor systems has become a challenge, and several programming models have been proposed in academia to address this issue. Power and thermals are also an ever-present thorn in the side of mass-market applications. Through the years, CPUs based on the von Neumann architecture have fended off attacks from many directions; today, complex superscalar implementations execute multiple instructions each clock cycle, in parallel and out of order, while keeping up the illusion of sequential processing. Recent research demonstrates, though, that augmenting the von Neumann paradigm with a few established concepts from data-flow and task-parallel programming creates a credible and intuitive parallel architecture, enabling notable improvements in compute efficiency while retaining compatibility with the current mainstream.
This talk will therefore review the current state of the processor industry. After highlighting why we are running out of steam in instruction-level parallelism (ILP), I will outline the task-superscalar programming model as the “ring to rule them all” and provide insights into how this architecture can take advantage of special hardware acceleration for data-flow management and provide support for efficient neuromorphic computing.